BSure 0.0.0.9000
BSure is a method for the robust inference of gene essentiality types for pooled CRISPR survival screens and for the assessment of screen quality and design of screens. It uses Bayesian methods to model explicitly measurement noise and levels of replicability. We obtain a set of possible values for the essentiality score, each of them with a probability of it being the real value, that is a posterior probability distribution of scores. This is illustrated in the figure below. The credible interval is an interval that contains the unknown true value of the essentiality with a probability of 95%. The higher the screen quality, replicability of the experiment and agreement of gRNAs, the narrower the credible interval, that is we learn more about the essentiality from a higher quality screen and can narrow down the range in which the true essentiality type must be contained. For details see Strauss and Parts, 2020. In preparation, and the examples and illustrations in this vignette.
Range of essentiality types: posterior distribution and credible interval
library(BSure)
To illustrate the method, we first load a subset from the transcriptome-wide screen in the HT29 cancer cell line published in Allen et al. (2019). The reduced dataset contains a mix of essential and nonessential genes.
data(counts_HT29_small)
We compute log-fold changes between counts from the knockout screens and the plasmid library.
colnames(HT29_small)
## [1] "sgRNA" "gene" "HT29_C902R1_P1D14"
## [4] "HT29_C902R2_P1D14" "HT29_C902R3_P1D14" "HT29_c903R1"
## [7] "HT29_c903R2" "HT29_c903R3" "HT29_c903R4"
## [10] "HT29_c903R5" "HT29_c903R6" "HT29_c904R1"
## [13] "HT29_c904R2" "HT29_c904R3" "HT29_controls"
#The last column is the control measurement. The first column contains the gRNAs
#and the second the names of the corresponding genes.
nc <- ncol(HT29_small)
counts <- HT29_small[,-c(1:2,nc)]#removing columns of gRNAs, genes and controls
controls <- HT29_small[,nc]
lfc_small <- compute_lfc_from_counts(counts=counts, controls=controls)
HT29_lfc_small <- cbind(HT29_small$sgRNA,HT29_small$gene,lfc_small)#required format
#for BSure algorithm:
#first column: gRNAs, second column: corresponding genes, third to last column:
#log-fold changes for all replicates
Now we can run the BSure algorithm. For the small dataset, we can run it on one compute core.
runBSure(lfc=HT29_lfc_small,save_file_name="HT29_lfc_small",min_tail_ESS=500)
and analyse the output.
load("HT29_lfc_small.rda")#save_file_name from above plus ".rda"
extracted_output_HT29 <- extract_from_output(output)
plot_credible_intervals(extracted_output_HT29,"HT29_small")
## quartz_off_screen
## 2
This created a folder named HT29_small, containing the following plots:
HT29_small_credible_intervals_as_function_of_mean.pdf
In this figure the credible intervals for the essentiality of all genes, defined by their upper bounds (brown line) and lower bounds (black line) were plotted along the y-axis and the mean estimates of essentiality for the corresponding genes were plotted along the x-axis. The smaller the credible intervals, the more precisely we know the true essentiality type of the gene. As we see in this figure, credible intervals tend to be narrower for nonessential genes (on the right, with mean essentiality types around 0) than for more essential genes (on the left, strongly negative mean essentiality types.) Narrow credible intervals are indicative of a reliable screen of good quality which allows robust assessment of essentiality types for all genes. This is an example of a good screen with acceptable credible intervals.
HT29_small_density_length_of_credible_interval.pdf
This figure illustrates the distribution of the lengths of the credible intervals across all genes.
|
|
|
HT29_small_Rhat_credible_interval.pdf and HT29_small_tail_ESS_credible_interval.pdf
The two figures above are technical figures to check if there were any problems with the algorithm running on the dataset and inferring reliable credible intervals. Problems would be indicated by correlations apparent from the figures above. There are no problems apparent in this case.
extracted_output_HT29 has the following output
sample summary: a data frame containing mean estimates of the essentiality (means), credible intervals (CI_lower_bounds for the lower bounds of the credible intervals, CI_upper_bounds for the upper bounds), prob_essential_I: probability of being essential of type I (see below for illustration), prob_essential_II: probability of being essential of type II (higher level for essentiality, see below)
prop_expensive_sampling and pro_very_expensive_sampling: a technical measure of the percentage of genes for which more expensive sampling methods were required.
Essentiality of type II is determined by comparing the posterior distribution of the gene to that of known core essential genes (Hart et al. (2017)) and core fitness genes (F. M. Behan et al. (2019)), for details see Strauss and Parts (2020), for illustrations see the examples further below.
To illustrate, essentiality types I and II, we now run BSure on a small subset of genes (vector_of_genes=genes_HT29_small[seq(1,length(genes_HT29_small),10)]), this time plotting the posterior distributions by setting the variable plot_folder_name to the name of a subdirectory where the plots will be saved.
genes_HT29_small <- unique(HT29_lfc_small[,2])
runBSure(lfc=HT29_lfc_small,save_file_name="HT29_lfc_small_sub",n_cores=1,min_tail_ESS=500,vector_of_genes=genes_HT29_small[seq(1,length(genes_HT29_small),10)],plot_folder_name="HT29_small_gene_distributions")
An example of a highly essential gene. It is essential of type II with probability 1, as its posterior distribution is shifted by less than 1/3 compared to the distribution of core essential genes.
This gene is essential of type I because of the distinctness of its distribution from that of nonessential genes. Its evidence for being essential of type II is lower, as only part of its distribution is shifted less than 1/3 to the right compared to the distribution of core essential genes.
For genome-wide screens we recommend running BSure in parallel on 5-25 cores. The following example applies the method to the full genome-wide HT29 screen, from which the small dataset above was an extract. The code below runs BSure and plots credible intervals and diagnostic figures as illustrated above for the small example dataset. As running this function on a genome-wide screen is computationally more expensive and because of current problems concerning the parallel package if used with RStudio in connection with R-4, the following code cannot be run from RStudio with R-4.0.0-R-4.0.2.
library(BSure)
nCores <- 25
data("counts_HT29")
dir.create("results_HT29")
nc <- ncol(counts_HT29)
lfc <- cbind(counts_HT29[,1:2],compute_lfc_from_counts(counts_HT29[,-c(1:2,nc)],counts_HT29[,nc]))
save_file_name <- "results_HT29/HT29_full"
runBSure(lfc,save_file_name,nCores,2000)
load(paste(save_file_name,".rda",sep=""))
extracted_output <- extract_from_output(output)
save(extracted_output,file=paste0(save_file_name,"_extracted_output.rda"))
plot_credible_intervals(extracted_output,"results_HT29")
The figures below compare for the screen analysed the following sets of genes: A) genes on a list of core essential (Hart et al. (2017)) and core fitness genes (F. M. Behan et al. (2019)). B) genes on a list of nonessential genes (Hart et al. (2014)), C) genes identified as essential of type I by the BSure method, D) genes identified as essential of type II by the BSure method. The following are characteristics of a screen of good power to distinguish between genes of different levels of essentiality. The main purpose of the scores below is across-screen comparison of screen quality and reliability. The suggested thresholds are therefore not hard cutoffs. In particular, if stricter FDR thresholds are used, then all of the four scores will be lower. Therefore, we recommend to using the default FDR of 0.05 here.
data("HT29_full_extracted_output")
extracted_gene_categories_HT29 <- extract_gene_categories(HT29_full_extracted_output)
extracted_gene_categories_HT29$proportion_core_essential_in_essential_II
## [1] 0.7372881
extracted_gene_categories_HT29$proportion_core_essential_in_essential_I
## [1] 0.9661017
extracted_gene_categories_HT29$proportion_non_essential_genes_in_essential_II
## [1] 0.01742627
extracted_gene_categories_HT29$proportion_non_essential_genes_in_essential_I
## [1] 0.1045576
extracted_gene_categories_HT29$score
## [1] 0.7394183
We repeat with a stricter FDR level. This leads to extremely confident sets of essential genes for both types, but high false negative rates, and therefore most core essential genes are not included in this list. We recommend this approach to determine differentially essential genes across cell lines, see Section 4 below, but not for general assessment of screen quality.
data("HT29_full_extracted_output")
extracted_gene_categories_HT29_strict <- extract_gene_categories(HT29_full_extracted_output,10^-3)
extracted_gene_categories_HT29_strict$proportion_core_essential_in_essential_II
## [1] 0.5367232
extracted_gene_categories_HT29_strict$proportion_core_essential_in_essential_I
## [1] 0.1384181
extracted_gene_categories_HT29_strict$proportion_non_essential_genes_in_essential_II
## [1] 0.00536193
extracted_gene_categories_HT29_strict$proportion_non_essential_genes_in_essential_I
## [1] 0.002680965
extracted_gene_categories_HT29_strict$score
## [1] 0.269885
We load the processed output for screens on the K562 cell (Peets et al. (2019)) line with different Cas9 activity levels and different experiment coverage.
data("resultsK562_highest_extracted_output") #screens with highest experiment coverage levels (160x-180x), lower Cas9 activity, 3' readout
data("resultsK562_high_extracted_output") #120x-130x experiment coverage levels, lower Cas9 activity, 3' readout
data("resultsK562_lowest_extracted_output") #25x-30x experiment coverage levels, lower Cas9 activity, 3' readout
data("resultsK562_highCas9_extracted_output")#higher Cas9 activity, 134x coverage, 3' and 5' readout
data("resultsK562_highest_3_5_extracted_output")
First, we look at the gold standard screen
extracted_gene_categories_K562_best <- extract_gene_categories(resultsK562_highCas9_extracted_output)
extracted_gene_categories_K562_best$proportion_core_essential_in_essential_II
## [1] 0.799435
extracted_gene_categories_K562_best$proportion_core_essential_in_essential_I
## [1] 0.8926554
extracted_gene_categories_K562_best$proportion_non_essential_genes_in_essential_II
## [1] 0.01330377
extracted_gene_categories_K562_best$proportion_non_essential_genes_in_essential_I
## [1] 0.03436807
extracted_gene_categories_K562_best$score
## [1] 0.8103922
Now we check how well genes of different essentiality types are retained with the lower Cas9 levels and only 3' readout (but highest experiment coverage):
extracted_gene_categories_K562_highest<- extract_gene_categories(resultsK562_highest_extracted_output)
extracted_gene_categories_K562_highest$proportion_core_essential_in_essential_II
## [1] 0.5451977
extracted_gene_categories_K562_highest$proportion_core_essential_in_essential_I
## [1] 0.6271186
extracted_gene_categories_K562_highest$proportion_non_essential_genes_in_essential_II
## [1] 0.03880266
extracted_gene_categories_K562_highest$proportion_non_essential_genes_in_essential_I
## [1] 0.05432373
extracted_gene_categories_K562_highest$score
## [1] 0.5304016
We can directly compare the sets of essentiality types I and II found from the two screens:
comparison_Cas9_K562 <- compare_to_exisiting_screen(resultsK562_highCas9_extracted_output,resultsK562_highest_extracted_output)
comparison_Cas9_K562
## $proportion_genes_recovered
## [1] 0.7640449
proportion_genes_recovered is the proportion of essential genes of types I and II from the gold standard set (FDR 0.1 for both types) that are also contained in the corresponding set for the new screen. The closer to one, the better the two screens agree in identifying genes at different essentiality types.
Here, a majority of the genes are recovered.
Now we check how well genes of different essentiality types are retained with the lower Cas9 levels if we use both the 3' and 5' readout:
comparison_Cas9_K562_B <- compare_to_exisiting_screen(resultsK562_highCas9_extracted_output,resultsK562_highest_3_5_extracted_output)
comparison_Cas9_K562_B
## $proportion_genes_recovered
## [1] 0.8216292
We see that the proportion of recovered genes has increased.
``` This shows that by including both 3' and 5' readout, the genes set obtained are closer to the ones obtained by using the data with the higher Cas9 levels (and both 3' and 5' readout).
For screens of lower Cas9 levels, 3' readout, and different levels of experiment coverage compare the lengths of the credible intervals pairwise.
comparison_highest_lowest <- compare_length_of_credible_intervals(resultsK562_highest_extracted_output,resultsK562_lowest_extracted_output, "160x-180x","25x-30x")
comparison_highest_high <- compare_length_of_credible_intervals(resultsK562_highest_extracted_output,resultsK562_high_extracted_output, "160x-180x","120x-130x")
The figures above suggest 120x-130x is the optimal coverage, as credible intervals are already comparable to those of the 160x-180x screen.
To compare essentiality types in different cell lines (here K562 from Peets et al. (2019) and HT29 from Allen et al. (2019)) robustly, we compare sets of essential genes of levels I and II (with significance levels as detailed above) and sets of unessential genes of levels I and II (unessential with the same significance levels), and determine if there are overlaps between significantly essential genes in one cell line and significantly unessential genes in the other.
extracted_gene_categories_K562_best_strict <- extract_gene_categories(resultsK562_highCas9_extracted_output,0.001)
extracted_gene_categories_K562_best_strict$proportion_core_essential_in_essential_II
## [1] 0.7514124
extracted_gene_categories_K562_best_strict$proportion_core_essential_in_essential_I
## [1] 0.2740113
extracted_gene_categories_K562_best_strict$proportion_non_essential_genes_in_essential_II
## [1] 0.01219512
extracted_gene_categories_K562_best_strict$proportion_non_essential_genes_in_essential_I
## [1] 0
extracted_gene_categories_K562_best_strict$score
## [1] 0.4537571
intersect(extracted_gene_categories_K562_best_strict$not_essential_genes_II,extracted_gene_categories_HT29_strict$essential_genes_II)
## [1] "ATP2A2" "ABCE1" "TUBG1" "SEC61A1" "EIF1AX" "NARFL"
## [7] "PSMB6" "PSMB5" "PSMB4" "PSMB3" "BRF2" "IPO11"
## [13] "IARS" "DDX21" "RPL18A" "PCNA" "TPX2" "PRPF38B"
## [19] "NPIPA5" "VPS28" "DYNC1I2" "KIF23" "XRCC6" "GTPBP4"
## [25] "PLK1" "VCP" "HAUS6" "SNRPD3" "SNRPD1" "ISCU"
## [31] "SMU1" "PSMD14" "EEF1A1" "CCT6A" "LRRC37A3" "CPSF3L"
## [37] "LSM5" "RPS3" "RPL34" "RPL32" "PPWD1" "SNRNP27"
## [43] "SARS" "ERH" "UBA52" "PMF1" "SNRNP200" "CRNKL1"
## [49] "CSE1L" "HINFP" "DBR1" "RRM1" "CCT8" "CCT2"
## [55] "CCT3" "CCT4" "DCTN6" "RAN" "RPL27A" "NCBP1"
## [61] "LARS" "NSA2" "PSMC5" "PSMC6" "RPL23A" "LUC7L3"
## [67] "MCM5" "PKMYT1" "NEDD1" "NUTF2" "SNRPA1" "UTP15"
## [73] "RPS23" "RPS20" "RPS28" "LRR1" "TUT1" "ATP6V1E1"
## [79] "RAB18" "UBE2I" "LSM2" "RPAP1" "HNRNPC" "SRP54"
## [85] "AHCTF1" "XAB2" "POLR3F" "POLR3B" "POLR3A" "HMGCS1"
## [91] "MED14" "RPL8" "RPL6" "RPL3" "SCD" "RPL7"
## [97] "RPL5" "WDR43" "SDE2" "DNAJC19" "SUPT16H" "PRPF31"
## [103] "RPS15A" "HARS" "QARS" "DHX15" "RACGAP1" "MPHOSPH10"
## [109] "YARS" "UBL5" "DTL" "ATP6V1H" "SAP30BP" "PSMD6"
## [115] "RRN3" "RABGGTB" "YEATS4" "DONSON" "RHEB" "SACM1L"
## [121] "TIMM23" "NCBP2" "EIF2B2" "CDC5L" "SBDS" "TARS"
## [127] "PSMA6" "SRSF7" "SRSF3" "PUF60" "WEE1" "H3F3A"
## [133] "SEC61G" "CARS" "GGPS1" "CDC16" "RPS19" "RPA1"
## [139] "RPA3" "NUP88" "SNRPF" "SNRPG" "SNRPE" "CBWD2"
## [145] "AURKB" "NARS" "GSPT1" "RPS12" "RPS11" "RPS15"
## [151] "RPS14" "RPS18" "CLP1" "WDR33" "KARS" "ACTR10"
## [157] "SSU72" "POLR2G" "POLR2F" "POLR2A" "POLR2B" "POLR2H"
## [163] "POLR2K" "CCDC94" "RPL15" "RPL17" "RPL11" "RPL19"
## [169] "TOP3A" "TXNL4A" "TXNL4B" "NSF" "DHDDS" "SLC9B1"
## [175] "MAK16" "SF1" "PAFAH1B1" "SF3A1" "SF3A3" "NOP58"
## [181] "USP39" "NUF2" "ATP6V0C" "DUT" "GMPS" "RPL37A"
## [187] "RCC1" "THOC7" "ECD" "TCEB1" "GINS2" "GINS1"
## [193] "GINS4" "PSMA2" "PSMA3" "PSMA1" "PSMA5" "PPP2CA"
## [199] "COPA" "EIF3F" "GTF2E1" "DAD1" "SASS6" "MIS12"
## [205] "CDC27" "CDC23" "RPF2" "NCAPD2" "ALG11" "DDX56"
## [211] "ANAPC10" "MYC" "SPC25" "RPS27A" "DYNLRB1" "NUDT21"
## [217] "EIF4A3" "KIF11" "BUD31" "ACTL6A" "HSPA9" "RPS4X"
## [223] "SLMO2" "NRDE2" "CDK1" "FAM133B" "RPS6" "RPS8"
## [229] "CDK7" "RPSA" "RPL24" "RPL26" "RPL21" "RPL23"
## [235] "PRPF19" "SHFM1" "POLE2" "GTF2B" "HSPE1" "DHFR"
## [241] "SF3B1" "SF3B2" "LSM4" "FARSB" "PHF5A" "TOP2A"
## [247] "RPL31" "SPC24" "CENPN" "CENPK" "RBM39" "EIF4E"
## [253] "INTS4" "INTS9" "SCAP" "RPS25" "RPS21" "RPS29"
extracted_gene_categories_K562_best$false_positive_rate_not_essential_II
## [1] 0.005008661
extracted_gene_categories_HT29$false_positive_rate_essential_II
## [1] 0.050067
intersect(intersect(extracted_gene_categories_K562_best_strict$essential_genes_I,extracted_gene_categories_K562_best_strict$essential_genes_II),intersect(extracted_gene_categories_HT29_strict$not_essential_genes_I,extracted_gene_categories_HT29_strict$not_essential_genes_II))
## [1] "ABCE1" "TUBG1" "PSMB5" "IARS" "RPL18A" "NPIPA5"
## [7] "PLK1" "VCP" "SNRPD1" "SMU1" "ANAPC1" "EEF1A1"
## [13] "CCT6A" "RPL34" "RPL32" "SNRNP27" "SNRNP200" "CSE1L"
## [19] "DBR1" "CCT4" "RPL27A" "RPLP2" "RPL23A" "NEDD1"
## [25] "NUTF2" "RPS23" "LRR1" "SRP54" "POLR3A" "RPL8"
## [31] "RPL6" "RPL7" "SUPT16H" "YARS" "UBL5" "SAP30BP"
## [37] "PSMD6" "NCBP2" "EIF2B2" "PSMA6" "SRSF7" "SRSF3"
## [43] "PUF60" "CARS" "RPS19" "SNRPF" "SNRPG" "SNRPE"
## [49] "ESPL1" "EEF2" "RPS11" "RPS16" "RPS18" "CLP1"
## [55] "KARS" "POLR2F" "POLR2A" "POLR2B" "RPL15" "RPL11"
## [61] "TOP3A" "NSF" "MAK16" "SF1" "NAE1" "NOP58"
## [67] "NUF2" "DUT" "GMPS" "ECD" "PSMA2" "PSMA1"
## [73] "PSMA5" "COPA" "DAD1" "MIS12" "CDC27" "CDC23"
## [79] "RPF2" "MYC" "DYNLRB1" "NUDT21" "EIF4A3" "KIF11"
## [85] "RPS4X" "CDK1" "CDK7" "RPSA" "RPL26" "RPL21"
## [91] "LSM4" "PHF5A" "TOP2A" "SPC24" "CENPK" "RPS25"
intersect(extracted_gene_categories_K562_best_strict$essential_genes_I,extracted_gene_categories_HT29_strict$not_essential_genes_I)
## [1] "ABCE1" "TUBG1" "PSMB5" "IARS" "RPL18A" "NPIPA5"
## [7] "PLK1" "VCP" "CPSF3" "SNRPD1" "SMU1" "ANAPC1"
## [13] "EEF1A1" "CCT6A" "RPL34" "RPL32" "SNRNP27" "SNRNP200"
## [19] "CSE1L" "DBR1" "NAA50" "CCT4" "RPL27A" "RPLP2"
## [25] "SNW1" "RPL23A" "NEDD1" "NUTF2" "RPS23" "LRR1"
## [31] "CDC123" "SRP54" "POLR3A" "RPL8" "RPL6" "RPL7"
## [37] "SUPT16H" "YARS" "UBL5" "SAP30BP" "PSMD6" "NCBP2"
## [43] "EIF2B2" "PSMA6" "SRSF7" "SRSF3" "PUF60" "CARS"
## [49] "RPS19" "SNRPF" "SNRPG" "SNRPE" "ESPL1" "EEF2"
## [55] "RPS11" "RPS16" "RPS18" "CLP1" "KARS" "POLR2F"
## [61] "POLR2A" "POLR2B" "RPL15" "RPL11" "TOP3A" "NSF"
## [67] "NIFK" "MAK16" "SF1" "NAE1" "NOP58" "NUF2"
## [73] "DUT" "GMPS" "NOP16" "ECD" "PSMA2" "PSMA1"
## [79] "PSMA5" "COPA" "DAD1" "MIS12" "CDC27" "CDC23"
## [85] "RPF2" "MYC" "DYNLRB1" "NUDT21" "EIF4A3" "KIF11"
## [91] "MED4" "RPS4X" "CDK1" "CDK7" "RPSA" "RPL26"
## [97] "RPL21" "LSM4" "PHF5A" "TOP2A" "SPC24" "CENPK"
## [103] "RPS25"
intersect(extracted_gene_categories_K562_best_strict$not_essential_genes_II,extracted_gene_categories_HT29_strict$essential_genes_II)
## [1] "ATP2A2" "ABCE1" "TUBG1" "SEC61A1" "EIF1AX" "NARFL"
## [7] "PSMB6" "PSMB5" "PSMB4" "PSMB3" "BRF2" "IPO11"
## [13] "IARS" "DDX21" "RPL18A" "PCNA" "TPX2" "PRPF38B"
## [19] "NPIPA5" "VPS28" "DYNC1I2" "KIF23" "XRCC6" "GTPBP4"
## [25] "PLK1" "VCP" "HAUS6" "SNRPD3" "SNRPD1" "ISCU"
## [31] "SMU1" "PSMD14" "EEF1A1" "CCT6A" "LRRC37A3" "CPSF3L"
## [37] "LSM5" "RPS3" "RPL34" "RPL32" "PPWD1" "SNRNP27"
## [43] "SARS" "ERH" "UBA52" "PMF1" "SNRNP200" "CRNKL1"
## [49] "CSE1L" "HINFP" "DBR1" "RRM1" "CCT8" "CCT2"
## [55] "CCT3" "CCT4" "DCTN6" "RAN" "RPL27A" "NCBP1"
## [61] "LARS" "NSA2" "PSMC5" "PSMC6" "RPL23A" "LUC7L3"
## [67] "MCM5" "PKMYT1" "NEDD1" "NUTF2" "SNRPA1" "UTP15"
## [73] "RPS23" "RPS20" "RPS28" "LRR1" "TUT1" "ATP6V1E1"
## [79] "RAB18" "UBE2I" "LSM2" "RPAP1" "HNRNPC" "SRP54"
## [85] "AHCTF1" "XAB2" "POLR3F" "POLR3B" "POLR3A" "HMGCS1"
## [91] "MED14" "RPL8" "RPL6" "RPL3" "SCD" "RPL7"
## [97] "RPL5" "WDR43" "SDE2" "DNAJC19" "SUPT16H" "PRPF31"
## [103] "RPS15A" "HARS" "QARS" "DHX15" "RACGAP1" "MPHOSPH10"
## [109] "YARS" "UBL5" "DTL" "ATP6V1H" "SAP30BP" "PSMD6"
## [115] "RRN3" "RABGGTB" "YEATS4" "DONSON" "RHEB" "SACM1L"
## [121] "TIMM23" "NCBP2" "EIF2B2" "CDC5L" "SBDS" "TARS"
## [127] "PSMA6" "SRSF7" "SRSF3" "PUF60" "WEE1" "H3F3A"
## [133] "SEC61G" "CARS" "GGPS1" "CDC16" "RPS19" "RPA1"
## [139] "RPA3" "NUP88" "SNRPF" "SNRPG" "SNRPE" "CBWD2"
## [145] "AURKB" "NARS" "GSPT1" "RPS12" "RPS11" "RPS15"
## [151] "RPS14" "RPS18" "CLP1" "WDR33" "KARS" "ACTR10"
## [157] "SSU72" "POLR2G" "POLR2F" "POLR2A" "POLR2B" "POLR2H"
## [163] "POLR2K" "CCDC94" "RPL15" "RPL17" "RPL11" "RPL19"
## [169] "TOP3A" "TXNL4A" "TXNL4B" "NSF" "DHDDS" "SLC9B1"
## [175] "MAK16" "SF1" "PAFAH1B1" "SF3A1" "SF3A3" "NOP58"
## [181] "USP39" "NUF2" "ATP6V0C" "DUT" "GMPS" "RPL37A"
## [187] "RCC1" "THOC7" "ECD" "TCEB1" "GINS2" "GINS1"
## [193] "GINS4" "PSMA2" "PSMA3" "PSMA1" "PSMA5" "PPP2CA"
## [199] "COPA" "EIF3F" "GTF2E1" "DAD1" "SASS6" "MIS12"
## [205] "CDC27" "CDC23" "RPF2" "NCAPD2" "ALG11" "DDX56"
## [211] "ANAPC10" "MYC" "SPC25" "RPS27A" "DYNLRB1" "NUDT21"
## [217] "EIF4A3" "KIF11" "BUD31" "ACTL6A" "HSPA9" "RPS4X"
## [223] "SLMO2" "NRDE2" "CDK1" "FAM133B" "RPS6" "RPS8"
## [229] "CDK7" "RPSA" "RPL24" "RPL26" "RPL21" "RPL23"
## [235] "PRPF19" "SHFM1" "POLE2" "GTF2B" "HSPE1" "DHFR"
## [241] "SF3B1" "SF3B2" "LSM4" "FARSB" "PHF5A" "TOP2A"
## [247] "RPL31" "SPC24" "CENPN" "CENPK" "RBM39" "EIF4E"
## [253] "INTS4" "INTS9" "SCAP" "RPS25" "RPS21" "RPS29"
intersect(extracted_gene_categories_K562_best_strict$essential_genes_II,extracted_gene_categories_HT29_strict$not_essential_genes_II)
## [1] "ATP2A2" "ABCE1" "TUBG1" "SEC61A1" "EIF1AX"
## [6] "NARFL" "PSMB6" "PSMB5" "PSMB4" "PSMB3"
## [11] "BRF2" "IPO11" "IARS" "DDX21" "RPL18A"
## [16] "PCNA" "TPX2" "PRPF38B" "NPIPA5" "VPS28"
## [21] "DYNC1I2" "KIF23" "XRCC6" "GTPBP4" "PLK1"
## [26] "VCP" "HAUS6" "SNRPD3" "SNRPD1" "ISCU"
## [31] "SMU1" "ANAPC1" "PSMD14" "EEF1A1" "CCT6A"
## [36] "LRRC37A3" "CPSF3L" "LSM5" "RPS3" "RPL34"
## [41] "RPL32" "PPWD1" "SNRNP27" "SARS" "ERH"
## [46] "UBA52" "PMF1" "SNRNP200" "CRNKL1" "CSE1L"
## [51] "HINFP" "DBR1" "RRM1" "CCT8" "CCT2"
## [56] "CCT3" "CCT4" "DCTN6" "PMPCB" "RAN"
## [61] "RPL27A" "NCBP1" "CDC45" "LARS" "RPLP2"
## [66] "NSA2" "PSMC5" "PSMC6" "RPL23A" "LUC7L3"
## [71] "MCM5" "PKMYT1" "NEDD1" "NUTF2" "SNRPA1"
## [76] "UTP15" "RPS23" "RPS20" "RPS28" "LRR1"
## [81] "TUT1" "ATP6V1E1" "RAB18" "UBE2I" "LSM2"
## [86] "RPAP1" "HNRNPC" "SRP54" "AHCTF1" "XAB2"
## [91] "POLR3F" "POLR3B" "POLR3A" "HSPE1-MOB4" "HMGCS1"
## [96] "MED14" "RPL8" "RPL6" "RPL3" "SCD"
## [101] "RPL7" "RPL5" "WDR43" "SDE2" "DNAJC19"
## [106] "SUPT16H" "PRPF31" "RPS15A" "HARS" "QARS"
## [111] "DHX15" "RACGAP1" "MPHOSPH10" "YARS" "UBL5"
## [116] "DTL" "ATP6V1H" "SAP30BP" "PSMD6" "RRN3"
## [121] "RABGGTB" "YEATS4" "DONSON" "RHEB" "SACM1L"
## [126] "TIMM23" "NCBP2" "EIF2B2" "CDC5L" "SBDS"
## [131] "TARS" "PSMA6" "SRSF7" "SRSF3" "PUF60"
## [136] "WEE1" "H3F3A" "SEC61G" "CARS" "GGPS1"
## [141] "CDC16" "RPS19" "RPA1" "RPA3" "NUP88"
## [146] "SNRPF" "SNRPG" "SNRPE" "ESPL1" "CBWD2"
## [151] "EEF2" "AURKB" "CDK9" "NARS" "GSPT1"
## [156] "RPS12" "RPS11" "RPS16" "RPS15" "RPS14"
## [161] "RPS18" "CLP1" "WDR33" "KARS" "ACTR10"
## [166] "PRPF8" "SSU72" "POLR2G" "POLR2F" "POLR2A"
## [171] "POLR2B" "POLR2H" "POLR2K" "SKP1" "CCDC94"
## [176] "RPL15" "RPL17" "RPL11" "RPL12" "RPL19"
## [181] "PCYT1A" "TOP3A" "TXNL4A" "TXNL4B" "NSF"
## [186] "DHDDS" "SLC9B1" "MAK16" "SF1" "PAFAH1B1"
## [191] "SF3A1" "SF3A3" "NAE1" "NOP58" "USP39"
## [196] "NUF2" "ORAOV1" "ATP6V0C" "DUT" "GMPS"
## [201] "RPL37A" "RCC1" "THOC7" "ECD" "TCEB1"
## [206] "GINS2" "GINS1" "GINS4" "PSMA2" "PSMA3"
## [211] "PSMA1" "PSMA5" "PPP2CA" "COPA" "EIF3F"
## [216] "GTF2E1" "TARDBP" "DAD1" "SASS6" "MIS12"
## [221] "CDC27" "CDC23" "RPF2" "NCAPD2" "ALG11"
## [226] "DDX56" "ANAPC10" "MYC" "SPC25" "RPS27A"
## [231] "DYNLRB1" "NUDT21" "EIF4A3" "KIF11" "BUD31"
## [236] "HJURP" "ACTL6A" "HSPA9" "RPS4X" "SLMO2"
## [241] "NRDE2" "CDK1" "FAM133B" "RPS6" "RPS8"
## [246] "CDK7" "RPSA" "RPL24" "RPL26" "RPL21"
## [251] "RPL23" "PRPF19" "SHFM1" "POLE2" "GTF2B"
## [256] "HSPE1" "DHFR" "SF3B1" "SF3B2" "LSM4"
## [261] "FARSB" "STX18" "PHF5A" "TOP2A" "RPL31"
## [266] "SPC24" "CENPN" "CENPK" "RBM39" "EIF4E"
## [271] "INTS4" "INTS9" "SCAP" "RPS25" "RPS21"
## [276] "RPS29"
Now we have obtained a list of genes confidently differentially essential in the two cell lines.
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] BSure_0.0.0.9000 knitr_1.30 BiocStyle_2.16.1
##
## loaded via a namespace (and not attached):
## [1] tidyr_1.1.2 viridisLite_0.3.0 jsonlite_1.7.1
## [4] carData_3.0-4 RcppParallel_5.0.2 StanHeaders_2.21.0-6
## [7] assertthat_0.2.1 BiocManager_1.30.10 stats4_4.0.2
## [10] cellranger_1.1.0 yaml_2.2.1 pillar_1.4.6
## [13] backports_1.1.10 glue_1.4.2 digest_0.6.25
## [16] polyclip_1.10-0 ggsignif_0.6.0 colorspace_1.4-1
## [19] eulerr_6.1.0 cowplot_1.1.0 htmltools_0.5.0
## [22] pkgconfig_2.0.3 rstan_2.21.2 broom_0.7.1
## [25] haven_2.3.1 bookdown_0.20 purrr_0.3.4
## [28] scales_1.1.1 processx_3.4.4 openxlsx_4.2.2
## [31] rootSolve_1.8.2.1 rio_0.5.16 tibble_3.0.3
## [34] generics_0.0.2 farver_2.0.3 car_3.0-10
## [37] ggplot2_3.3.2 ellipsis_0.3.1 ggpubr_0.4.0
## [40] withr_2.3.0 cli_2.0.2 magrittr_1.5
## [43] crayon_1.3.4 readxl_1.3.1 evaluate_0.14
## [46] ps_1.3.4 fansi_0.4.1 MASS_7.3-51.6
## [49] rstatix_0.6.0 forcats_0.5.0 foreign_0.8-80
## [52] pkgbuild_1.1.0 ggthemes_4.2.0 tools_4.0.2
## [55] loo_2.3.1 data.table_1.13.0 prettyunits_1.1.1
## [58] hms_0.5.3 lifecycle_0.2.0 matrixStats_0.57.0
## [61] stringr_1.4.0 V8_3.2.0 munsell_0.5.0
## [64] zip_2.1.1 callr_3.4.4 compiler_4.0.2
## [67] rlang_0.4.7 grid_4.0.2 labeling_0.3
## [70] rmarkdown_2.4 gtable_0.3.0 codetools_0.2-16
## [73] inline_0.3.16 abind_1.4-5 curl_4.3
## [76] R6_2.4.1 gridExtra_2.3 dplyr_1.0.2
## [79] polylabelr_0.2.0 stringi_1.5.3 parallel_4.0.2
## [82] Rcpp_1.0.5 vctrs_0.3.4 tidyselect_1.1.0
## [85] xfun_0.18
Allen, Felicity, Fiona Behan, Anton Khodak, Francesco Iorio, Kosuke Yusa, Mathew Garnett, and Leopold Parts. 2019. “JACKS: joint analysis of CRISPR/Cas9 knockout screens.” Genome Research 29 (3). Cold Spring Harbor Laboratory Press: 464–71. doi:10.1101/gr.238923.118.
Behan, Fiona M., Francesco Iorio, Gabriele Picco, Emanuel Gonçalves, Charlotte M. Beaver, Giorgia Migliardi, Rita Santos, et al. 2019. “Prioritization of cancer therapeutic targets using CRISPR–Cas9 screens.” Nature 568 (7753). Nature Publishing Group: 511–16. doi:10.1038/s41586-019-1103-9.
Hart, Traver, Kevin R Brown, Fabrice Sircoulomb, Robert Rottapel, and Jason Moffat. 2014. “Measuring error rates in genomic perturbation screens: gold standards for human functional genomics.” Molecular Systems Biology 10 (7). EMBO: 733. doi:10.15252/msb.20145216.
Hart, Traver, Amy Hin Yan Tong, Katie Chan, Jolanda Van Leeuwen, Ashwin Seetharaman, Michael Aregger, Megha Chandrashekhar, et al. 2017. “Evaluation and design of genome-wide CRISPR/SpCas9 knockout screens.” G3: Genes, Genomes, Genetics 7 (8). Genetics Society of America: 2719–27. doi:10.1534/g3.117.041277.
Peets, Elin Madli, Luca Crepaldi, Yan Zhou, Felicity Allen, Rasa Elmentaite, Guillaume Noell, Gemma Turner, Vivek Iyer, and Leopold Parts. 2019. “Minimized double guide RNA libraries enable scale-limited CRISPR/Cas9 screens.” bioRxiv, November. Cold Spring Harbor Laboratory, 859652. doi:10.1101/859652.
Strauss, ME, and L Parts. 2020. In Preparation.